pulley system


LLM world models are mental: Output layer evidence of brittle world model use in LLM mechanical reasoning

arXiv.org Artificial Intelligence

Do large language models (LLMs) construct and manipulate internal world models, or do they rely solely on statistical associations represented as output layer token probabilities? We adapt cognitive science methodologies from human mental models research to test LLMs on pulley system problems using TikZ-rendered stimuli. Study 1 examines whether LLMs can estimate mechanical advantage (MA). State-of-the-art models performed marginally but significantly above chance, and their estimates correlated significantly with ground-truth MA. Significant correlations between number of pulleys and model estimates suggest that models employed a pulley counting heuristic, without necessarily simulating pulley systems to derive precise values. Study 2 tested this by probing whether LLMs represent global features crucial to MA estimation. Models evaluated a functionally connected pulley system against a fake system with randomly placed components. Without explicit cues, models identified the functional system as having greater MA with F1=0.8, suggesting LLMs could represent systems well enough to differentiate jumbled from functional systems. Study 3 built on this by asking LLMs to compare functional systems with matched systems which were connected up but which transferred no force to the weight; LLMs identified the functional system with F1=0.46, suggesting random guessing. Insofar as they may generalize, these findings are compatible with the notion that LLMs manipulate internal world models, sufficient to exploit statistical associations between pulley count and MA (Study 1), and to approximately represent system components' spatial relations (Study 2). However, they may lack the facility to reason over nuanced structural connectivity (Study 3). We conclude by advocating the utility of cognitive scientific methods to evaluate the world-modeling capacities of artificial intelligence systems.


Probing Mechanical Reasoning in Large Vision Language Models

arXiv.org Artificial Intelligence

Mechanical reasoning is a fundamental ability that sets human intelligence apart from other animal intelligence. It allows us to design tools, build bridges and canals, and construct houses, which laid the foundation of human civilization. Endowing machines with this ability is an important step towards building human-level artificial intelligence. Recently, Li et al. built CogDevelop2K, a data-intensive cognitive experiment benchmark for assaying the developmental trajectory of machine intelligence (Li et al., 2024). Here, to investigate mechanical reasoning in Large Vision Language Models (VLMs), we leverage the MechBench of CogDevelop2K, which contains approximately 150 cognitive experiments, to test understanding of mechanical system stability, gears and pulley systems, seesaw-like systems and the leverage principle, inertia and motion, and fluid-related systems. We observe diverse yet consistent behaviors over these aspects in VLMs.


Personal Mobility With Synchronous Trunk-Knee Passive Exoskeleton: Optimizing Human-Robot Energy Transfer

arXiv.org Artificial Intelligence

We present a personal mobility device for lower-body impaired users in the form of a lightweight exoskeleton on wheels. At its core, a novel passive exoskeleton provides postural transition leveraging natural body postures, supporting the trunk during sit-to-stand and stand-to-sit (STS) transitions with a single gas spring as an energy storage unit. We propose a direction-dependent coupling of knee and hip joints through a double-pulley wire system, transferring energy from the torso motion towards balancing the moment load at the knee joint actuator. In this way, the exoskeleton maximizes energy transfer and the naturalness of the user's movement. We introduce an embodied user interface for hands-free navigation through torso pressure sensing with minimal trunk rotations, averaging $19^{\circ} \pm 13^{\circ}$ across six unimpaired users. We evaluated the design for STS assistance on 11 unimpaired users, observing motions and muscle activity during the transitions. Results comparing assisted and unassisted STS transitions showed a significant reduction in muscle activity (up to $68\%$, $p<0.01$) in the involved muscle groups. Moreover, we showed that the transitions are feasible through natural torso-leaning movements of $+12^{\circ}\pm 6.5^{\circ}$ and $-13.7^{\circ} \pm 6.1^{\circ}$ for standing and sitting, respectively. Passive postural transition assistance warrants further work on increasing its applicability and broadening the user population.